Predicting the insurgence of human genetic diseases associated to single point protein mutations with support vector machines and evolutionary information
نویسندگان
چکیده
MOTIVATION Human single nucleotide polymorphisms (SNPs) are the most frequent type of genetic variation in human population. One of the most important goals of SNP projects is to understand which human genotype variations are related to Mendelian and complex diseases. Great interest is focused on non-synonymous coding SNPs (nsSNPs) that are responsible of protein single point mutation. nsSNPs can be neutral or disease associated. It is known that the mutation of only one residue in a protein sequence can be related to a number of pathological conditions of dramatic social impact such as Alzheimer's, Parkinson's and Creutzfeldt-Jakob's diseases. The quality and completeness of presently available SNPs databases allows the application of machine learning techniques to predict the insurgence of human diseases due to single point protein mutation starting from the protein sequence. RESULTS In this paper, we develop a method based on support vector machines (SVMs) that starting from the protein sequence information can predict whether a new phenotype derived from a nsSNP can be related to a genetic disease in humans. Using a dataset of 21 185 single point mutations, 61% of which are disease-related, out of 3587 proteins, we show that our predictor can reach more than 74% accuracy in the specific task of predicting whether a single point mutation can be disease related or not. Our method, although based on less information, outperforms other web-available predictors implementing different approaches. AVAILABILITY A beta version of the web tool is available at http://gpcr.biocomp.unibo.it/cgi/predictors/PhD-SNP/PhD-SNP.cgi
منابع مشابه
Use of estimated evolutionary strength at the codon level improves the prediction of disease-related protein mutations in humans.
Predicting the functional impact of protein variation is one of the most challenging problems in bioinformatics. A rapidly growing number of genome-scale studies provide large amounts of experimental data, allowing the application of rigorous statistical approaches for predicting whether a given single point mutation has an impact on human health. Up until now, existing methods have limited the...
متن کاملDesigning an intelligent system for predicting chromosomal genetic diseases using data mining
Background and Aim: Today we are witnessing tremendous advances in medical data mining. The data, by analyzing and discovering the relationships between them, can lead to algorithms that help us prevent or treat many diseases. Meanwhile, genetic diseases have attracted a large part of the attention of the medical world because the birth of children with genetic disorders imposes a great financi...
متن کاملPredicting cardiac arrhythmia on ECG signal using an ensemble of optimal multicore support vector machines
The use of artificial intelligence in the process of diagnosing heart disease has been considered by researchers for many years. In this paper, an efficient method for selecting appropriate features extracted from electrocardiogram (ECG) signals, based on a genetic algorithm for use in an ensemble multi-kernel support vector machine classifiers, each of which is based on an optimized genetic al...
متن کاملPrediction of function changes associated with single-point protein mutations using support vector machines (SVMs).
Computational methods can be used to predict the effects of single amino acid substitutions (single-point mutations). In contrast to previous methods that need many protein sequence and structural features, we applied support vector machines (SVMs) to predict protein function changes associated with amino acid substitutions using only sequence information, and cross-validated them on a large da...
متن کاملPredicting protein stability changes from sequences using support vector machines
MOTIVATION The prediction of protein stability change upon mutations is key to understanding protein folding and misfolding. At present, methods are available to predict stability changes only when the atomic structure of the protein is available. Methods addressing the same task starting from the protein sequence are, however, necessary in order to complete genome annotation, especially in rel...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
- Bioinformatics
دوره 22 22 شماره
صفحات -
تاریخ انتشار 2006